This presentation focuses on predicting two variables in our Beer Sales data: Channel and Spend Per Trip.

For our Channel predictions, we've built Random Forest and Support Vector models, each with an accuracy of ~0.6. This seems good at first, but we're having difficulty getting the models to predict cases where the channel is equal to "mass" (i.e., Walmart, Target, Costco); they default to "food" in the vast majority of cases. We believe continued development of these models will rectify this issue.

For our Spend Per Trip predictions, we've built Lasso and Neural Network models, with R² values of 0.81 and 0.86, respectively, which shows the models are doing a good job of predicting our customers' spend based on the attributes of their trips.

| Type | Model | Accuracy | Sensitivity | Specificity | Precision | RMSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| Regression | Neural Network | NA | NA | NA | NA | 3.03 | 0.96 | 0.86 |
| Classification | Random Forest (Tuned) | 0.61 | 0.07 | 0.98 | 0.65 | NA | NA | NA |
| Classification | Support Vector | 0.60 | 0.02 | 0.99 | 0.59 | NA | NA | NA |
| Regression | Lasso (Tuned) | NA | NA | NA | NA | 3.35 | 1.05 | 0.81 |
The Project
The Problem Description

This project focuses on a beer sales dataset that captures detailed customer transactions, specifically beer purchases from October 2013 to October 2014. My primary goal is to develop models that effectively identify patterns within these transactions.

For the categorical variable I want to predict, I am going with "channel", which indicates the type of venue where the purchase was made. I am filtering this to two channels, "food" (i.e., grocery stores) and "mass" (i.e., Walmart, Target, Costco).

For the continuous variable, I’m trying to predict the “spend_per_trip”. This indicates how much the customer spent in total on their trip to the store.

Data Shape

The raw dataset has 184,973 rows and 12 columns; two of the columns are IDs that will not be used in my models.

Data Transformations

The categorical columns all have high cardinality, so I filtered to the two largest sales channels, food and mass. This resulted in a dataset with 149,718 rows and 12 columns.

The initial dataset contains multiple rows when a customer bought more than one type of item on a trip. Because of this, I created two datasets to capture information about trips and customers.
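The filter-then-aggregate step described above could be sketched as follows. This is a minimal illustration in pandas, not the project's actual code; the toy data and the use of `user_id` as the trip key are assumptions for brevity (the real data would key on a trip identifier).

```python
import pandas as pd

# Toy stand-in for the raw transaction data (one row per item purchased).
raw = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "channel": ["food", "mass", "liquor", "mass", "food"],
    "spend_per_trip": [7.49, 16.64, 9.99, 11.99, 4.77],
})

# Keep only the two largest sales channels.
filtered = raw[raw["channel"].isin(["food", "mass"])]

# Collapse multiple item rows into one row per trip.
trips = filtered.groupby("user_id", as_index=False).agg(
    channel=("channel", "first"),
    spend_per_trip=("spend_per_trip", "sum"),
)
```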

Data Source

DU MSBA alumnus Andrew Brooks

Variable Descriptions

TO PREDICT WITH
* Trip Date: The date of the User’s trip.
* Retailer: The name of the venue where the sale takes place (e.g., Walmart, Kroger, and Safeway).
* Banner: The sub-category of Retailer, if there’s a subtype available.
* Parent Brand: Name of the parent company that is selling the beer (e.g., Bud Light, Miller, and Coors).
* Brand: Name of the brand of beer that is being sold.
* Item Description: The type of item that is being sold, like a 12 or 24 pack.
* Units Per Trip: How many of the item were bought.
* Spend Per Unit: How much money was spent per unit that was bought.
* Basket Size (Units): How many items were in the User’s basket on that trip.
* Basket Size ($): The monetary value of the User’s basket on the trip.

WE WANT TO PREDICT
* Channel: The type of venue where the trip takes place (e.g., Bodega, Fast Food, Drug Store).
* Spend Per Trip: How much money was spent on the trip on Beer.

Preview of the data
| | channel | units_per_trip | spend_per_unit | spend_per_trip | basket_size_units | basket_side_dollars | trip_date_quarter | trip_date_month | is_weekend | is_weekday | season | day_of_week | spend_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 165998 | mass | 1 | 16.64 | 16.64 | 3 | 74.77 | 2 | 4 | 1 | 0 | Spring | Sunday | High |
| 153947 | mass | 1 | 11.99 | 11.99 | 21 | 125.89 | 3 | 7 | 0 | 1 | Summer | Thursday | Medium |
| 128031 | mass | 1 | 4.77 | 4.77 | 17 | 69.37 | 1 | 2 | 1 | 0 | Winter | Saturday | Low |
| 154524 | mass | 1 | 6.19 | 6.19 | 25 | 177.75 | 1 | 2 | 1 | 0 | Winter | Saturday | Low |
| 57586 | food | 1 | 7.49 | 7.49 | 21 | 68.68 | 2 | 5 | 1 | 0 | Spring | Saturday | Low |
| 82002 | mass | 1 | 15.97 | 15.97 | 12 | 74.83 | 4 | 11 | 1 | 0 | Fall | Saturday | Medium |
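The engineered date features shown in the preview (quarter, month, weekend flags, season, day of week) could be derived from `trip_date` roughly as below. This is a sketch, not the project's code, and the month-to-season mapping is my assumption.

```python
import pandas as pd

# Dates chosen to mirror the first three preview rows.
df = pd.DataFrame({
    "trip_date": pd.to_datetime(["2014-04-06", "2014-07-10", "2014-02-08"])
})

df["trip_date_quarter"] = df["trip_date"].dt.quarter
df["trip_date_month"] = df["trip_date"].dt.month
df["day_of_week"] = df["trip_date"].dt.day_name()
df["is_weekend"] = df["day_of_week"].isin(["Saturday", "Sunday"]).astype(int)
df["is_weekday"] = 1 - df["is_weekend"]

# Assumed meteorological-season mapping by month.
season_map = {12: "Winter", 1: "Winter", 2: "Winter",
              3: "Spring", 4: "Spring", 5: "Spring",
              6: "Summer", 7: "Summer", 8: "Summer",
              9: "Fall", 10: "Fall", 11: "Fall"}
df["season"] = df["trip_date_month"].map(season_map)
```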
Random Forest Model

In this random forest model, we’re trying to determine at which of the two largest channels a consumer purchased their beer. An accuracy of 0.61 reflects some positive initial results, but the sensitivity of 0.067 shows the model is doing a poor job of predicting when a trip was at a “mass” channel. A specificity of 0.976 tells me it’s doing really well on the “food” channel, but it also tells me the model is simply picking “food” for most rows. For next steps, we could look at increasing the importance of the “mass” class in our model training.

| .metric | .estimate |
|---|---|
| accuracy | 0.613 |
| sensitivity | 0.067 |
| specificity | 0.976 |
| precision | 0.649 |
| Prediction | Truth: mass | Truth: food |
|---|---|---|
| mass | 1615 | 874 |
| food | 22332 | 35067 |
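The "increase the importance of the 'mass' class" idea could be sketched with class weights, as below. This is an assumption-laden illustration in scikit-learn (the synthetic data and `class_weight="balanced"` choice are mine), shown only to demonstrate the technique, not the project's original modeling code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data standing in for the mass/food split.
X, y = make_classification(n_samples=2000, weights=[0.4, 0.6], random_state=42)

# class_weight="balanced" reweights each class inversely to its frequency,
# so misclassifying the rarer class costs more during training.
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
clf.fit(X, y)
```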
Support Vector Classifier

The performance of this Support Vector Classifier and the Random Forest model is remarkably similar, with a slight loss in precision for the SVC.

| .metric | .estimate |
|---|---|
| accuracy | 0.602 |
| sensitivity | 0.018 |
| specificity | 0.992 |
| precision | 0.590 |
| Prediction | Truth: mass | Truth: food |
|---|---|---|
| mass | 435 | 302 |
| food | 23512 | 35639 |
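The reported metrics can be recomputed directly from the confusion matrix above, treating "mass" as the positive class. This is plain arithmetic, not the project's evaluation code, but it makes explicit which class each metric refers to.

```python
# Cells of the SVC confusion matrix above ("mass" = positive class).
tp, fn = 435, 23512    # truth = mass: predicted mass / predicted food
fp, tn = 302, 35639    # truth = food: predicted mass / predicted food

accuracy = (tp + tn) / (tp + fn + fp + tn)
sensitivity = tp / (tp + fn)   # fraction of "mass" trips correctly caught
specificity = tn / (tn + fp)   # fraction of "food" trips correctly caught
precision = tp / (tp + fp)     # how often a "mass" prediction is right

print(round(accuracy, 3), round(sensitivity, 3),
      round(specificity, 3), round(precision, 3))
# → 0.602 0.018 0.992 0.59
```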
Model Summary

In this model, we’re trying to predict a customer’s total spend on a trip using a Lasso model.

Tuning found that the lowest-RMSE Lasso penalty is 0.01.
| Type | Model | Accuracy | Sensitivity | Specificity | Precision | RMSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| Classification | Random Forest (Tuned) | 0.613 | 0.067 | 0.976 | 0.649 | NA | NA | NA |
| Regression | Lasso (Tuned) | NA | NA | NA | NA | 3.353 | 1.047 | 0.811 |
| Classification | Support Vector | 0.602 | 0.018 | 0.992 | 0.590 | NA | NA | NA |
| Regression | Neural Network | NA | NA | NA | NA | 3.032 | 0.958 | 0.856 |
Model Output

Classification Models: Both models suffer greatly on the sensitivity measure. This indicates that the models are struggling to classify “mass” trips, in most cases simply predicting “food”, hence the high specificity.

Regression Models: Both models perform similarly, but I would lean towards the Neural Network model for its slightly superior performance (lower RMSE and MAE, higher R²).

| Type | Model | Accuracy | Sensitivity | Specificity | Precision | RMSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| Regression | Neural Network | NA | NA | NA | NA | 3.03 | 0.96 | 0.86 |
| Classification | Random Forest (Tuned) | 0.61 | 0.07 | 0.98 | 0.65 | NA | NA | NA |
| Classification | Support Vector | 0.60 | 0.02 | 0.99 | 0.59 | NA | NA | NA |
| Regression | Lasso (Tuned) | NA | NA | NA | NA | 3.35 | 1.05 | 0.81 |
ROC Curves
Conclusion

Our Spend Per Trip models are doing a good job of predicting our customers’ Spend Per Trip, both with R² values above 0.80. I would recommend continuing to explore these models and trying to understand which predictors have the greatest impact on their predictions.

Our Channel models present a greater challenge. There is a slight class imbalance, and our models’ sensitivity (predicting “mass”) is very low, around or below 0.07. This means the models are primarily just predicting that the Channel is equal to “food” in the vast majority of cases. For next steps, I would balance the training dataset 50/50 between mass and food and evaluate the results. Alternatively, you could look at methods for increasing the importance of the “mass” class and see if the models adjust.
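The 50/50 balancing idea from the next steps could be sketched by downsampling the majority class, as below. This is a toy pandas illustration (tiny invented data, random seed of my choosing), not the project's code; the real dataset has ~150k rows.

```python
import pandas as pd

# Toy imbalanced data: 6 "food" trips vs 3 "mass" trips.
df = pd.DataFrame({"channel": ["food"] * 6 + ["mass"] * 3, "x": range(9)})

# Downsample every class to the size of the smallest class.
n_minority = df["channel"].value_counts().min()
balanced = pd.concat(
    g.sample(n=n_minority, random_state=42)
    for _, g in df.groupby("channel")
)
```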

I am most proud of the work I did to integrate my old Python code into this project. I see this as laying the groundwork for incorporating R into my future Python projects. Working through this class, I came to really like many of R’s features and its syntax, and I could see myself using R for data prep and exploration in a future project.

If given another week, I would spend the time refining the four predictive models in this report. I certainly felt a lot of crunch with the capstone running at the same time as my full-time job.